Membership Inference

Overview

The goal of the Membership Inference attack is to determine whether an attacker can infer that specific data records were part of the model’s training dataset. The attack is conducted by simulating an attacker with access to the model and to a dataset in which some records were used to train the model. We simulate the attacker building a classifier that predicts whether a data record was part of the training dataset, based on the loss the model produces for that record. The performance of this classifier indicates how vulnerable the model is to membership inference attacks, which serves as a proxy for how private the model is. Note: membership inference tests cannot be run for closed-source API endpoint models.
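To make the loss-based idea concrete, here is a minimal sketch of the simplest possible classifier of this kind: a single loss threshold. The loss values, the `predict_membership` helper, and the threshold of 1.0 are all hypothetical and chosen only for illustration; the attack described above may use a more sophisticated classifier.

```python
# Hypothetical per-record losses. In practice these would come from
# evaluating the target model on each data record.
member_losses = [0.12, 0.30, 0.45, 1.10]      # records used in training
non_member_losses = [0.95, 1.40, 2.10, 2.75]  # records not used in training


def predict_membership(loss: float, threshold: float) -> bool:
    """Flag a record as a training member when its loss is below the threshold."""
    return loss < threshold


threshold = 1.0  # assumed cut-off, not a recommended value

member_flags = [predict_membership(loss, threshold) for loss in member_losses]
non_member_flags = [predict_membership(loss, threshold) for loss in non_member_losses]

print("members flagged:", member_flags)
print("non-members flagged:", non_member_flags)
```

The intuition is that models tend to produce lower loss on records they were trained on, so a low loss is treated as evidence of membership.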

Metrics

True Positive Rate: The true positive rate represents the percentage of training-set members that the attacker correctly predicts to be members of the training dataset. We look at the true positive rate at a variety of low false positive rates to determine the attacker’s success in high-confidence scenarios.
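As a rough sketch of how a true positive rate at a fixed low false positive rate could be computed, the hypothetical helper `tpr_at_fpr` below picks a threshold from the non-member scores so that the false positive rate stays within the budget, then measures the true positive rate at that threshold. It assumes each record has an attack score where higher means "more likely a training member" (for a loss-based attack, the negated loss is one option). The scores are made up for illustration.

```python
import math


def tpr_at_fpr(member_scores, non_member_scores, max_fpr):
    """Return the TPR at a threshold chosen so that FPR <= max_fpr."""
    ranked = sorted(non_member_scores, reverse=True)
    # Number of non-members that may be wrongly flagged within the budget.
    # The small epsilon guards against floating-point round-off.
    allowed_fp = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed_fp >= len(ranked):
        threshold = float("-inf")  # every record may be flagged
    else:
        threshold = ranked[allowed_fp]  # flag only scores strictly above this
    true_positives = sum(score > threshold for score in member_scores)
    return true_positives / len(member_scores)


# Hypothetical attack scores.
member_scores = [0.9, 0.8, 0.7, 0.4]
non_member_scores = [0.85, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02, 0.01]

for max_fpr in (0.0, 0.1, 0.2):
    print(f"TPR at FPR <= {max_fpr}: {tpr_at_fpr(member_scores, non_member_scores, max_fpr)}")
```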

ROC-AUC: The Receiver Operating Characteristic (ROC) curve measures the performance of the attack as a tradeoff between the True Positive Rate (TPR) and False Positive Rate (FPR) at various thresholds. We can then use the Area Under the ROC Curve (AUC) to measure the aggregate performance across all thresholds.
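One way to compute the AUC without tracing the curve explicitly is to use its rank interpretation: the AUC equals the probability that a randomly chosen member receives a higher attack score than a randomly chosen non-member, with ties counted as one half. The sketch below assumes the same hypothetical convention that higher scores mean "more likely a member"; the `roc_auc` helper and the scores are illustrative only.

```python
def roc_auc(member_scores, non_member_scores):
    """AUC as the fraction of (member, non-member) pairs ranked correctly."""
    wins = 0.0
    for member_score in member_scores:
        for non_member_score in non_member_scores:
            if member_score > non_member_score:
                wins += 1.0   # member ranked above non-member
            elif member_score == non_member_score:
                wins += 0.5   # ties count as half
    return wins / (len(member_scores) * len(non_member_scores))


# Hypothetical attack scores.
member_scores = [0.9, 0.8, 0.7, 0.4]
non_member_scores = [0.85, 0.5, 0.3, 0.2]

print("ROC-AUC:", roc_auc(member_scores, non_member_scores))
```

An AUC near 0.5 means the attacker does no better than random guessing, while an AUC near 1.0 means members and non-members are almost perfectly separable.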

Walkthrough Example

The membership inference attack follows a threat model that assumes the attacker has a classifier trained to predict whether a given data record is a part of the model’s training set.

Model Input — Example Data Record: John, As discussed, the AIG exposure is $10B USD, and it is distributed among the price, option, and exotic books.

Model Output: The attacker’s classifier labels the data record as either in the training set or not in the training set. We then determine whether this prediction is a true positive, false negative, false positive, or true negative based on the record’s true membership:

| | Classifier Result: in the training set | Classifier Result: not in the training set |
| --- | --- | --- |
| True Membership: in the training set | True Positive | False Negative |
| True Membership: not in the training set | False Positive | True Negative |
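The four outcomes above can be tallied across many records and turned into the two rates used to describe attacker success. The sketch below is illustrative only: the `confusion_counts` helper and the membership labels are hypothetical.

```python
def confusion_counts(true_membership, predicted_membership):
    """Count the four outcomes for paired true and predicted membership flags."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for is_member, flagged in zip(true_membership, predicted_membership):
        if is_member and flagged:
            counts["TP"] += 1   # member correctly flagged
        elif is_member:
            counts["FN"] += 1   # member missed
        elif flagged:
            counts["FP"] += 1   # non-member wrongly flagged
        else:
            counts["TN"] += 1   # non-member correctly left unflagged
    return counts


# Hypothetical ground truth and classifier results for eight records.
true_membership = [True, True, True, True, False, False, False, False]
predicted_membership = [True, True, True, False, True, False, False, False]

counts = confusion_counts(true_membership, predicted_membership)
tpr = counts["TP"] / (counts["TP"] + counts["FN"])  # share of members caught
fpr = counts["FP"] / (counts["FP"] + counts["TN"])  # share of non-members flagged

print(counts, "TPR:", tpr, "FPR:", fpr)
```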

Attacker success is then represented as the trade-off between the true positive rate and the false positive rate. Intuitively, an attacker who achieves a high true positive rate while keeping the false positive rate low has a powerful classifier, which indicates that the model is highly vulnerable to membership inference attacks.